In recent years, spammers are now trying to obfuscate their intents by introducing hybrid spam e-mail combining both image and text parts, which is more challenging to detect in comparison to e-mails containing text or image only. The motivation behind this research is to design an effective approach filtering out hybrid spam e-mails to avoid situations where traditional text-based or image-baesd only filters fail to detect hybrid spam e-mails. To the best of our knowledge, a few studies have been conducted with the goal of detecting hybrid spam e-mails. Ordinarily, Optical Character Recognition (OCR) technology is used to eliminate the image parts of spam by transforming images into text. However, the research questions are that although OCR scanning is a very successful technique in processing text-and-image hybrid spam, it is not an effective solution for dealing with huge quantities due to the CPU power required and the execution time it takes to scan e-mail files. And the OCR techniques are not always reliable in the transformation processes. To address such problems, we propose new late multi-modal fusion training frameworks for a text-and-image hybrid spam e-mail filtering system compared to the classical early fusion detection frameworks based on the OCR method. Convolutional Neural Network (CNN) and Continuous Bag of Words were implemented to extract features from image and text parts of hybrid spam respectively, whereas generated features were fed to sigmoid layer and Machine Learning based classifiers including Random Forest (RF), Decision Tree (DT), Naive Bayes (NB) and Support Vector Machine (SVM) to determine the e-mail ham or spam.
translated by 谷歌翻译
图像垃圾邮件威胁检测一直是互联网惊人扩展的流行研究领域。这项研究提出了一个可解释的框架,用于使用卷积神经网络(CNN)算法和可解释的人工智能(XAI)算法检测垃圾邮件图像。在这项工作中,我们使用CNN模型分别对图像垃圾邮件进行了分类,而hoc XAI方法包括局部可解释的模型不可思议的解释(Lime)和Shapley添加说明(SHAP),以提供有关黑手盒CNN的决定的解释关于垃圾邮件图像检测的模型。我们在6636图像数据集上训练,然后评估拟议方法的性能,包括垃圾邮件图像和从三个不同的公开电子邮件Corpora收集的垃圾邮件图像和正常图像。实验结果表明,根据不同的性能指标,提出的框架实现了令人满意的检测结果,而独立模型的XAI算法可以为不同模型的决策提供解释,以比较未来的研究。
translated by 谷歌翻译
过程变化和设备老化对电路设计师构成了深刻的挑战。如果不对变化对电路路径的延迟的影响进行精确理解,无法正确估计避免定时违规行为的后卫带。对于先进的技术节点,此问题加剧了,其中晶体管尺寸达到原子水平,并且已建立的边缘受到严格限制。因此,传统的最坏情况分析变得不切实际,导致无法忍受的性能开销。相反,过程变化/衰老感知的静态时序分析(STA)为设计师提供了准确的统计延迟分布。然后可以有效地估计小但足够的时正时标志。但是,这样的分析是昂贵的,因为它需要密集的蒙特卡洛模拟。此外,它需要访问基于机密的物理老化模型来生成STA所需的标准细胞库。在这项工作中,我们采用图形神经网络(GNN)来准确估计过程变化和设备衰老对电路中任何路径延迟的影响。我们提出的GNN4REL框架使设计师能够执行快速准确的可靠性估计,而无需访问晶体管模型,标准细胞库甚至STA;这些组件均通过铸造厂的训练纳入GNN模型中。具体而言,对GNN4REL进行了针对工业14NM测量数据进行校准的FinFET技术模型的培训。通过我们对EPFL和ITC-99基准以及RISC-V处理器进行的广泛实验,我们成功估计了所有路径的延迟降级(尤其是在几秒钟内),平均绝对误差降至0.01个百分点。
translated by 谷歌翻译
可靠性的关键问题是电路设计师的巨大关注之一。驱动力是晶体管老化,取决于操作电压和工作负载。在设计时,很难估算在终生期间保持衰老效果的近距离护罩。这是因为铸造厂不共享其基于物理的校准模型,该模型由高度机密的技术和材料参数组成。但是,对降解的不受监控但必要的高估相当于绩效下降,这是可以预防的。此外,这些基于物理学的模型在计算方面非常复杂。在设计时间为数百万个单个晶体管建模的成本显然是过高的。我们提出了经过培训的机器学习模型的革命前景,以复制基于物理的模型,以免披露机密参数。出于设计优化的目的,电路设计人员可以完全访问这种有效的解决方法。我们证明了模型通过对一个电路的数据进行训练并将其成功应用于基准电路的能力。平均相对误差高达1.7%,速度高达20倍。电路设计师有史以来首次可以易于使用高精度老化模型,这对于有效的设计至关重要。这项工作是跨越铸造厂和电路设计师之间宽阔鸿沟的方向的一个有希望的步骤。
translated by 谷歌翻译
Recent work has shown the benefits of synthetic data for use in computer vision, with applications ranging from autonomous driving to face landmark detection and reconstruction. There are a number of benefits of using synthetic data from privacy preservation and bias elimination to quality and feasibility of annotation. Generating human-centered synthetic data is a particular challenge in terms of realism and domain-gap, though recent work has shown that effective machine learning models can be trained using synthetic face data alone. We show that this can be extended to include the full body by building on the pipeline of Wood et al. to generate synthetic images of humans in their entirety, with ground-truth annotations for computer vision applications. In this report we describe how we construct a parametric model of the face and body, including articulated hands; our rendering pipeline to generate realistic images of humans based on this body model; an approach for training DNNs to regress a dense set of landmarks covering the entire body; and a method for fitting our body model to dense landmarks predicted from multiple views.
translated by 谷歌翻译
To generate high quality rendering images for real time applications, it is often to trace only a few samples-per-pixel (spp) at a lower resolution and then supersample to the high resolution. Based on the observation that the rendered pixels at a low resolution are typically highly aliased, we present a novel method for neural supersampling based on ray tracing 1/4-spp samples at the high resolution. Our key insight is that the ray-traced samples at the target resolution are accurate and reliable, which makes the supersampling an interpolation problem. We present a mask-reinforced neural network to reconstruct and interpolate high-quality image sequences. First, a novel temporal accumulation network is introduced to compute the correlation between current and previous features to significantly improve their temporal stability. Then a reconstruct network based on a multi-scale U-Net with skip connections is adopted for reconstruction and generation of the desired high-resolution image. Experimental results and comparisons have shown that our proposed method can generate higher quality results of supersampling, without increasing the total number of ray-tracing samples, over current state-of-the-art methods.
translated by 谷歌翻译
In this paper we explore the task of modeling (semi) structured object sequences; in particular we focus our attention on the problem of developing a structure-aware input representation for such sequences. In such sequences, we assume that each structured object is represented by a set of key-value pairs which encode the attributes of the structured object. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key, over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to a create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two part-training results in better performance than a unified network with hierarchical encoding as well as over, other methods that use a {\em record-view} representation of the sequence \cite{de2021transformers4rec} or a simple {\em flattened} representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks and detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
translated by 谷歌翻译
In this chapter, we review and discuss the transformation of AI technology in HCI/UX work and assess how AI technology will change how we do the work. We first discuss how AI can be used to enhance the result of user research and design evaluation. We then discuss how AI technology can be used to enhance HCI/UX design. Finally, we discuss how AI-enabled capabilities can improve UX when users interact with computing systems, applications, and services.
translated by 谷歌翻译
Deep neural networks (DNNs) are vulnerable to a class of attacks called "backdoor attacks", which create an association between a backdoor trigger and a target label the attacker is interested in exploiting. A backdoored DNN performs well on clean test images, yet persistently predicts an attacker-defined label for any sample in the presence of the backdoor trigger. Although backdoor attacks have been extensively studied in the image domain, there are very few works that explore such attacks in the video domain, and they tend to conclude that image backdoor attacks are less effective in the video domain. In this work, we revisit the traditional backdoor threat model and incorporate additional video-related aspects to that model. We show that poisoned-label image backdoor attacks could be extended temporally in two ways, statically and dynamically, leading to highly effective attacks in the video domain. In addition, we explore natural video backdoors to highlight the seriousness of this vulnerability in the video domain. And, for the first time, we study multi-modal (audiovisual) backdoor attacks against video action recognition models, where we show that attacking a single modality is enough for achieving a high attack success rate.
translated by 谷歌翻译
Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these various complex systems with massive amounts of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have theoretically shown the global solutions to the training network problem under a simplified "unconstrained feature model" exhibiting this phenomenon. We take a step further and prove the Neural Collapse occurrence for deep linear network for the popular mean squared error (MSE) and cross entropy (CE) loss. Furthermore, we extend our research to imbalanced data for MSE loss and present the first geometric analysis for Neural Collapse under this setting.
translated by 谷歌翻译